pacman::p_load(tidyverse, FunnelPlotR, plotly, knitr)
Visualising Models
patriciatrisno
May 8, 2025
May 9, 2025
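A funnel plot is a specially designed data visualisation for making unbiased comparisons between outlets, stores or business entities. By the end of this hands-on exercise, you will have gained hands-on experience in: plotting funnel plots by using the FunnelPlotR package; plotting a static funnel plot from scratch by using ggplot2; and making the funnel plot interactive by using plotly.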
In this exercise, four R packages will be used: tidyverse, FunnelPlotR, plotly and knitr.
In this section, the COVID-19_DKI_Jakarta dataset will be used. The data was downloaded from the Open Data Covid-19 Provinsi DKI Jakarta portal. For this hands-on exercise, we are going to compare cumulative COVID-19 cases and deaths by sub-district (i.e. kelurahan) in DKI Jakarta as at 31 July 2021.
The code chunk below imports the data into R and saves it into a tibble data frame object called covid19.
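A minimal sketch of the import step follows. It assumes the data is stored as a CSV at the hypothetical path data/COVID-19_DKI_Jakarta.csv. With the tidyverse loaded, read_csv() would read that file, and the text columns can then be converted to factors, which matches the <fct> columns in the tibble printed later. So that the sketch runs without the file, it rebuilds only the six sample rows shown in the table below as a base R data frame.

```r
# Hypothetical import of the full dataset (the path is an assumption):
#   covid19 <- read_csv("data/COVID-19_DKI_Jakarta.csv") %>%
#     mutate(across(where(is.character), as.factor))

# Runnable stand-in: the six sample rows from the table below
covid19 <- data.frame(
  `Sub-district ID` = c(3172051003, 3173041007, 3175041005,
                        3175031003, 3175101006, 3174031002),
  City = c("JAKARTA UTARA", "JAKARTA BARAT", "JAKARTA TIMUR",
           "JAKARTA TIMUR", "JAKARTA TIMUR", "JAKARTA SELATAN"),
  District = c("PADEMANGAN", "TAMBORA", "KRAMAT JATI",
               "JATINEGARA", "CIPAYUNG", "MAMPANG PRAPATAN"),
  `Sub-district` = c("ANCOL", "ANGKE", "BALE KAMBANG",
                     "BALI MESTER", "BAMBU APUS", "BANGKA"),
  Positive  = c(1776, 1783, 2049, 827, 2866, 1828),
  Recovered = c(1691, 1720, 1964, 797, 2792, 1757),
  Death     = c(26, 29, 31, 13, 27, 26),
  check.names = FALSE,      # keep names such as `Sub-district` unchanged
  stringsAsFactors = TRUE   # text columns become factors, as in the tibble later
)

str(covid19)
```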
Sub-district ID | City | District | Sub-district | Positive | Recovered | Death |
---|---|---|---|---|---|---|
3172051003 | JAKARTA UTARA | PADEMANGAN | ANCOL | 1776 | 1691 | 26 |
3173041007 | JAKARTA BARAT | TAMBORA | ANGKE | 1783 | 1720 | 29 |
3175041005 | JAKARTA TIMUR | KRAMAT JATI | BALE KAMBANG | 2049 | 1964 | 31 |
3175031003 | JAKARTA TIMUR | JATINEGARA | BALI MESTER | 827 | 797 | 13 |
3175101006 | JAKARTA TIMUR | CIPAYUNG | BAMBU APUS | 2866 | 2792 | 27 |
3174031002 | JAKARTA SELATAN | MAMPANG PRAPATAN | BANGKA | 1828 | 1757 | 26 |
The FunnelPlotR package uses ggplot2 to generate funnel plots. It requires a numerator (events of interest), a denominator (population to be considered) and a group. The key arguments selected for customisation are:
limit: plot limits (95 or 99).
label_outliers: to label outliers (true or false).
Poisson_limits: to add Poisson limits to the plot.
OD_adjust: to add overdispersed limits to the plot.
x_range and y_range: to specify the range to display for the axes; these act like a zoom function.
Observe the result, then check the part after it!
Things to learn from the code chunk above.
group in this function works differently from the scatterplot: here, it defines the level at which the points are plotted, i.e. Sub-district, District or City. If City is chosen, there are only six data points.
By default, the data_type argument is “SR” (standardised ratio).
limit: plot limits; accepted values are 95 or 99, corresponding to the 95% or 99.8% quantiles of the distribution.
This graph isn't very pleasing or useful, is it? It is hard to tell what the plot is about, let alone understand the information it carries!
There is a lot to fix!
In this part, we are going to fix the plot's data type, axis ranges and scaling so that the funnel plot accurately reflects COVID-19 fatality rates (proportions) and focuses on the relevant data range.
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
Things to learn from the code chunk above.
data_type argument is used to change from the default “SR” to “PR” (i.e. proportions).
x_range and y_range are used to set the ranges of the x-axis and y-axis.
Now we can see what this plot is about. However, looking at the plot as a whole, many details are still not properly placed or arranged, and some do not properly explain the data.
Here, we are going to look into the plot title and axis labels, and their placement and arrangement.
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
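funnel_plot(
  .data = covid19,
  numerator = Death,              # events of interest
  denominator = Positive,         # population considered
  group = `Sub-district`,         # level at which points are plotted
  data_type = "PR",               # proportions instead of the default "SR"
  x_range = c(0, 6500),           # zoom the x-axis
  y_range = c(0, 0.05),           # zoom the y-axis
  label = NA,                     # turn off the default outlier labels
  title = "Cumulative COVID-19 Fatality Rate by Cumulative Total Number of COVID-19 Positive Cases",
  x_label = "Cumulative COVID-19 Positive Cases",
  y_label = "Cumulative Fatality Rate"
)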
Things to learn from the code chunk above.
label = NA argument is used to remove the default outlier-labelling feature.
title argument is used to add the plot title.
x_label and y_label arguments are used to add or edit the x-axis and y-axis titles.
In this section, you will gain hands-on experience in building funnel plots step by step by using ggplot2. It aims to enhance your working experience with ggplot2 in customising specialised data visualisations such as the funnel plot.
To plot the funnel plot from scratch, we need to derive the cumulative death rate and its standard error for each sub-district.
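A minimal sketch of this derivation in base R follows. It assumes the usual binomial standard error SE = sqrt(p(1 - p) / n), with p = Death / Positive and n = Positive. This is the same form used for the control limits later in this section. The sketch uses only the six sample rows from the data table so that it runs on its own.

```r
# Six sample sub-districts from the data table above
df <- data.frame(
  `Sub-district` = c("ANCOL", "ANGKE", "BALE KAMBANG",
                     "BALI MESTER", "BAMBU APUS", "BANGKA"),
  Positive = c(1776, 1783, 2049, 827, 2866, 1828),
  Death    = c(26, 29, 31, 13, 27, 26),
  check.names = FALSE
)

# Cumulative death rate: deaths as a proportion of positive cases
df$rate <- df$Death / df$Positive

# Binomial standard error of that proportion
df$rate.se <- sqrt(df$rate * (1 - df$rate) / df$Positive)

print(df)
```

With the tidyverse loaded, the same two columns could be added with mutate(rate = Death / Positive, rate.se = sqrt(rate * (1 - rate) / Positive)).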
# A tibble: 266 × 9
`Sub-district ID` City District `Sub-district` Positive Recovered Death
<dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl>
1 3172051003 JAKARTA U… PADEMAN… ANCOL 1776 1691 26
2 3173041007 JAKARTA B… TAMBORA ANGKE 1783 1720 29
3 3175041005 JAKARTA T… KRAMAT … BALE KAMBANG 2049 1964 31
4 3175031003 JAKARTA T… JATINEG… BALI MESTER 827 797 13
5 3175101006 JAKARTA T… CIPAYUNG BAMBU APUS 2866 2792 27
6 3174031002 JAKARTA S… MAMPANG… BANGKA 1828 1757 26
7 3175051002 JAKARTA T… PASAR R… BARU 2541 2433 37
8 3175041004 JAKARTA T… KRAMAT … BATU AMPAR 3608 3445 68
9 3171071002 JAKARTA P… TANAH A… BENDUNGAN HIL… 2012 1937 38
10 3175031002 JAKARTA T… JATINEG… BIDARA CINA 2900 2773 52
# ℹ 256 more rows
# ℹ 2 more variables: rate <dbl>, rate.se <dbl>
Next, fit.mean, the overall mean death rate used as the centre line of the funnel, is computed by using the code chunk below.
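A minimal sketch follows. It assumes fit.mean is the inverse-variance weighted mean of the sub-district rates, which is a common choice. Under that assumption, each rate is weighted by 1 / rate.se^2, so sub-districts with more positive cases count for more. Base R's weighted.mean() computes this directly. The pooled rate, sum(Death) / sum(Positive), is shown alongside for comparison.

```r
# Six sample sub-districts from the data table above
Positive <- c(1776, 1783, 2049, 827, 2866, 1828)
Death    <- c(26, 29, 31, 13, 27, 26)

rate    <- Death / Positive
rate.se <- sqrt(rate * (1 - rate) / Positive)

# Assumption: fit.mean is the inverse-variance weighted mean of the rates.
# A sub-district with zero deaths would have rate.se = 0 (infinite weight),
# so such rows would need to be excluded before this step.
fit.mean <- weighted.mean(rate, 1 / rate.se^2)

# For comparison: the pooled rate across all sample sub-districts
pooled <- sum(Death) / sum(Positive)

c(fit.mean = fit.mean, pooled = pooled)
```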
The code chunk below is used to compute the lower and upper limits for the 95% and 99.9% confidence intervals.
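# Every possible denominator from 1 up to the largest number of positive cases
number.seq <- seq(1, max(df$Positive), 1)
# 95% limits (z = 1.96) around fit.mean, using the binomial standard error
number.ll95 <- fit.mean - 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul95 <- fit.mean + 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
# 99.9% limits (z = 3.29)
number.ll999 <- fit.mean - 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul999 <- fit.mean + 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
# Collect the limits in one data frame for plotting
dfCI <- data.frame(number.ll95, number.ul95, number.ll999,
                   number.ul999, number.seq, fit.mean)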
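The multipliers 1.96 and 3.29 are the standard normal quantiles for two-sided 95% and 99.9% intervals, and qnorm() recovers them. The sketch also computes 3.09, the multiplier for the 99.8% limits that the FunnelPlotR documentation associates with limit = 99. It uses a fit.mean of 0.015, a hypothetical value chosen only for illustration that is close to the sample rates in the data table. The output shows the funnel shape: the limits narrow as the number of positive cases grows.

```r
# Two-sided normal multipliers for the control limits
z95  <- qnorm(1 - 0.05 / 2)   # about 1.96 (95% limits)
z998 <- qnorm(1 - 0.002 / 2)  # about 3.09 (99.8% limits)
z999 <- qnorm(1 - 0.001 / 2)  # about 3.29 (99.9% limits)
round(c(z95 = z95, z998 = z998, z999 = z999), 2)

# Hypothetical centre line, for illustration only
fit.mean <- 0.015
number.seq <- c(100, 1000, 5000)

# Binomial control limits at each number of positive cases
se <- sqrt(fit.mean * (1 - fit.mean) / number.seq)
limits <- data.frame(
  number.seq,
  ll95  = fit.mean - z95 * se,
  ul95  = fit.mean + z95 * se,
  ll999 = fit.mean - z999 * se,
  ul999 = fit.mean + z999 * se
)
print(limits)
```

At 100 cases, both lower limits are already below zero. The ggplot2 code that follows keeps the view between 0 and 0.05 with coord_cartesian() rather than dropping these values.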
In the code chunk below, ggplot2 functions are used to plot a static funnel plot.
p <- ggplot(df, aes(x = Positive, y = rate)) +
  # label is not used by geom_point (ggplot2 warns) but feeds the ggplotly() tooltip
  geom_point(aes(label=`Sub-district`),
             alpha=0.4) +
  # 95% limits: dashed lines
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll95),
            linewidth = 0.4,
            colour = "grey40",
            linetype = "dashed") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul95),
            linewidth = 0.4,
            colour = "grey40",
            linetype = "dashed") +
  # 99.9% limits: solid lines
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll999),
            linewidth = 0.4,
            colour = "grey40") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul999),
            linewidth = 0.4,
            colour = "grey40") +
  # Centre line at the overall mean rate (drawn once, not once per dfCI row)
  geom_hline(yintercept = fit.mean,
             linewidth = 0.4,
             colour = "grey40") +
  # Zoom the y-axis without dropping data
  coord_cartesian(ylim=c(0,0.05)) +
  # Label the limit lines at their right-hand ends, inside the visible range
  annotate("text", x = max(dfCI$number.seq), y = tail(dfCI$number.ul95, 1),
           label = "95%", hjust = 1, vjust = -0.5, size = 3, colour = "grey40") +
  annotate("text", x = max(dfCI$number.seq), y = tail(dfCI$number.ul999, 1),
           label = "99.9%", hjust = 1, vjust = -0.5, size = 3, colour = "grey40") +
  ggtitle("Cumulative Fatality Rate by Cumulative Number of COVID-19 Cases") +
  xlab("Cumulative Number of COVID-19 Cases") +
  ylab("Cumulative Fatality Rate") +
  theme_light() +
  theme(plot.title = element_text(size=12),
        legend.position = c(0.91,0.85),
        legend.title = element_text(size=7),
        legend.text = element_text(size=7),
        legend.background = element_rect(colour = "grey60", linetype = "dotted"),
        legend.key.height = unit(0.3, "cm"))
p
The funnel plot created using ggplot2 functions can be made interactive with ggplotly() from the plotly R package.
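A minimal sketch of this step follows, assuming ggplot2 and plotly are installed. ggplotly() converts a ggplot object into an interactive plotly widget, and its tooltip argument selects which aesthetics appear on hover. Passing c("label", "x", "y") shows the sub-district name (mapped through the label aesthetic) together with the case count and rate. To stay self-contained, the sketch converts a small stand-in scatter built from the six sample rows rather than the full funnel plot. With p from above, the call would be ggplotly(p, tooltip = c("label", "x", "y")).

```r
library(ggplot2)
library(plotly)

# Six sample sub-districts from the data table above
df <- data.frame(
  `Sub-district` = c("ANCOL", "ANGKE", "BALE KAMBANG",
                     "BALI MESTER", "BAMBU APUS", "BANGKA"),
  Positive = c(1776, 1783, 2049, 827, 2866, 1828),
  Death    = c(26, 29, 31, 13, 27, 26),
  check.names = FALSE
)
df$rate <- df$Death / df$Positive

# label is not used by geom_point itself (ggplot2 warns about it),
# but ggplotly() can show it in the hover tooltip
p_small <- ggplot(df, aes(x = Positive, y = rate)) +
  geom_point(aes(label = `Sub-district`), alpha = 0.4) +
  coord_cartesian(ylim = c(0, 0.05)) +
  theme_light()

# Convert to an interactive plotly widget with a custom tooltip
fp_ggplotly <- ggplotly(p_small, tooltip = c("label", "x", "y"))
fp_ggplotly
```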